Abstract

Eigenvector localization refers to the situation in which most of the components of an eigenvector are zero or near-zero. This phenomenon has been observed on eigenvectors associated with extremal eigenvalues, and in many of those cases it can be meaningfully interpreted in terms of "structural heterogeneities" in the data. For example, the largest eigenvectors of adjacency matrices of large complex networks often have most of their mass localized on high-degree nodes; and the smallest eigenvectors of the Laplacians of such networks are often localized on small but meaningful community-like sets of nodes. Here, we describe localization associated with low-order eigenvectors, i.e., eigenvectors corresponding to eigenvalues that are not extremal but that are "buried" further down in the spectrum. Although we have observed it in several unrelated applications, this phenomenon of low-order eigenvector localization defies common intuitions and simple explanations, and it creates serious difficulties for the applicability of popular eigenvector-based machine learning and data analysis tools. After describing two examples where low-order eigenvector localization arises, we present a very simple model that qualitatively reproduces several of the empirically observed results. This model suggests certain coarse structural similarities among the seemingly unrelated applications where we have observed low-order eigenvector localization, and it may be used as a diagnostic tool to help extract insight from data graphs when such low-order eigenvector localization is present.

1 Introduction

The problem that motivated the work described in this paper had to do with using eigenvector-based methods to infer meaningful structure from graph-based or network-based data. Methods of this type are ubiquitous. For example, Principal Component Analysis and its variants have been widely used historically. More recently, nonlinear dimensionality reduction methods, spectral partitioning methods, spectral ranking methods, etc., have been used in increasingly sophisticated ways in machine learning and data analysis.

Although they can be applied to any data matrix, these eigenvector-based methods are gen- 
erally most appropriate when the data possess some sort of linear redundancy structure (in the 
original or in some nonlinearly-transformed basis) and when there is no single data point or no 
small number of data points that are particularly important or influential [H]. The presence of 
linear redundancy structure is typically quantified by the requirement that the rank of the matrix 
is small relative to its size, e.g., that most of the Frobenius norm of the matrix is captured by 
a small number of eigencomponents. The lack of a small number of particularly-influential data 
points is typically quantified by the requirement that the eigenvectors of the data matrix are 
delocalized. For example, matrix coherence and statistical leverage capture this idea [8]. 
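To make these two requirements concrete, here is a small NumPy sketch (the function names are illustrative, not taken from any cited work). It measures linear redundancy as the fraction of the squared Frobenius norm of a symmetric matrix captured by its $k$ eigencomponents of largest magnitude, and it measures influence through statistical leverage scores (squared row norms of the top-$k$ eigenvector matrix) and the coherence $\mu = (n/k)\max_i \ell_i$, which equals 1 for perfectly delocalized eigenvectors and $n/k$ at the opposite extreme.

```python
import numpy as np

def top_k(A, k):
    """Eigenvalues/eigenvectors of symmetric A with the k largest |eigenvalues| (hypothetical helper)."""
    evals, evecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(evals))[:k]
    return evals[idx], evecs[:, idx]

def frobenius_fraction(A, k):
    """Fraction of ||A||_F^2 captured by the top-k eigencomponents (A symmetric)."""
    evals = np.linalg.eigvalsh(A)
    sq = np.sort(evals**2)[::-1]        # ||A||_F^2 equals the sum of squared eigenvalues
    return sq[:k].sum() / sq.sum()

def leverage_scores(A, k):
    """Statistical leverage of each row: squared row norms of the top-k eigenvectors."""
    _, U = top_k(A, k)
    return np.sum(U**2, axis=1)         # nonnegative, and sums to k

def coherence(A, k):
    """(n / k) * max leverage: 1 if fully delocalized, n / k if maximally localized."""
    n = A.shape[0]
    return n / k * leverage_scores(A, k).max()

spread = np.ones((4, 4))                # rank one, eigenvector (1/2, 1/2, 1/2, 1/2)
spiky = np.diag([5.0, 1.0, 1.0, 1.0])   # top eigenvector (1, 0, 0, 0)
print(frobenius_fraction(spread, 1), coherence(spread, 1))
print(frobenius_fraction(spiky, 1), coherence(spiky, 1))
```

Both toy matrices are dominated by one eigencomponent, but only the first has a delocalized top eigenvector; the second concentrates all of its leverage on a single row.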

Localization in eigenvectors arises when most of the components of an eigenvector are zero or near-zero [15]. (Thus, eigenvector delocalization refers to the situation when most or all of the components of an eigenvector are small and of roughly the same magnitude. Below we will quantify this idea in two different ways.) While creating serious difficulties for recently popular eigenvector-based machine learning and data analysis methods, such a situation is far from unknown. Typically, though, this phenomenon occurs on eigenvectors associated with extremal eigenvalues. For example, the largest eigenvectors of adjacency matrices of large complex networks often have most of their mass localized on high-degree nodes jjj. Alternatively, the smallest eigenvectors of the Laplacian of such networks are often localized on small but meaningful community-like sets of nodes [17]. More generally, this phenomenon arises on extremal eigenvectors in applications where extreme sparsity is coupled with randomness or quasi-randomness [12] Q21 El EI]. In these cases, as a rule of thumb, the localization can often be interpreted in terms of a "structural heterogeneity" in the data, e.g., the degree (or coordination number) of a node being significantly higher or lower than average [12] [13] [HI EI].

In this paper, the phenomenon of localization of low-order eigenvectors in Laplacian matrices 
associated with certain classes of data graphs is described for several real-world data sets and 
analyzed with a simple model. By low-order eigenvectors, we mean eigenvectors associated with 
eigenvalues that are not extremal (in the sense of being the largest or smallest eigenvalues), but
that are "buried" further down in the eigenvalue spectrum of the data matrix. As a practical 
matter, such localization is most interesting in two cases: first, when it occurs in eigenvectors 
that are below, i.e., associated with smaller eigenvalues than, other eigenvectors that are sig- 
nificantly more delocalized; and second, when the localization occurs on entries or nodes that 
are meaningful, e.g., that correspond to meaningful clusters or other structures in the data, to a 
downstream analyst. 

We have observed this phenomenon of low-order eigenvector localization in several seemingly- 
unrelated applications (including, but not limited to, the Congress and the Migration data 
discussed in this paper, DNA single-nucleotide polymorphism data, spectral and hyperspectral 
data in astronomy and other natural sciences, etc.). Moreover, based on informal discussions with 
both practitioners and theorists of machine learning and data analysis, it has become clear that 
this phenomenon defies common intuitions and simple explanations. For example, the variance associated with these low-order eigenvectors is much less than the variance associated with "earlier," more delocalized eigenvectors. Moreover, these low-order eigenvectors must satisfy the global requirement of exact orthogonality with respect to all of the earlier delocalized eigenvectors, and they must do so while keeping most of their components zero or near-zero in magnitude. This requirement of exact orthogonality is responsible for the usefulness of eigenvector-based methods in machine learning and data analysis, but it often leads to non-interpretable vectors; recall, e.g., the characteristic "ringing" behavior of eigenfaces associated with low-order eigenvalues [36] E3], as well as the issues associated with eigenvector reification in the natural and social sciences [19].
For this and related reasons, it is often the case that by the time that most of the variance in the 
data is captured, the residual consists mostly of relatively-delocalized noise. Indeed, eigenvector- 
based models and methods in machine learning and data analysis typically simply assume that 
this is the case. 

In this paper, our contributions are threefold: first, we will introduce the notion of low-order 
eigenvector localization; second, we will describe several examples of this phenomenon in two real 
data sets; and third, we will present a very simple model that qualitatively reproduces several of 
the empirical observations. Our model is a very simple two-level tensor product construction in 
which each level can be "structured" or "unstructured." Aside from demonstrating the existence 
of low-order eigenvector localization in real data, our empirical results will illustrate that meaningful, very-low-variance parts of the data can, in some cases, be extracted in an unsupervised manner by looking at the localization properties of low-order eigenvectors. In addition, our simple model will suggest certain coarse structural similarities among seemingly unrelated applications, and it may be used as a diagnostic tool to help extract meaningful insight from real data graphs
when such low-order eigenvector localization is present. We will conclude the paper with a brief 
discussion of the implications of our results in a broader context. 

2 Data, methods, and related work 

In this section, we will provide a brief background on two classes of data where we have observed 
low-order eigenvector localization, and we will describe our methods and some related work. 

2.1 The two data sets we consider 

The main data set we consider, which will be called the CONGRESS data set, is a data set of roll call voting patterns in the U.S. Senate across time [26, 38, 22]. We considered Senates in the 70th Congress through the 110th Congress, thus covering the years 1927 to 2008. During this time, the U.S. went from 48 to 50 states, and thus the number of senators in each of these 41 Congresses was roughly the same. After preprocessing, there were $n = 735$ distinct senators in these 41 Congresses. We constructed an $n \times n$ adjacency matrix $A$, where each $A_{ij} \in [0, 1]$ represents the extent of voting agreement between legislators $i$ and $j$, and where identical senators in adjacent Congresses are connected with an inter-Congress connection strength. We then considered the Laplacian matrix of this graph, constructed in the usual way; see [38] for more details.

We also report on a data set, which we will call the Migration data set, that was recently considered in [10]. This contains data on county-to-county migration patterns in the U.S., constructed from the 2000 U.S. Census data, that reports the number of people that migrated from every county to every other county in the mainland U.S. during the 1995-2000 time frame fT| [30] ITU]. We denote by $M = (M_{ij})_{1 \le i,j \le N}$ the total number of people who migrated from county $i$ to county $j$ or from county $j$ to county $i$ (so $M_{ij} = M_{ji}$), where $N = 3107$ denotes the number of counties in the mainland U.S.; and we let $P_i$ denote the population of county $i$. We then build the similarity matrix $W_{ij} = M_{ij}^2 / (P_i P_j)$ and the diagonal scaling matrix $D_{ii} = \sum_{j=1}^{N} W_{ij}$, and we consider the usual random walk matrix, $D^{-1}W$, associated with this graph. We refer the reader to [10] for a discussion of variants of this similarity matrix.
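As a minimal sketch, assuming NumPy and assuming the similarity takes the form $W_{ij} = M_{ij}^2/(P_i P_j)$, the random walk matrix can be formed from a symmetric migration-count matrix and a population vector as follows. The function name and the three-county toy input are hypothetical.

```python
import numpy as np

def migration_random_walk(M, P):
    """Return the random walk matrix D^{-1} W for a symmetric migration matrix M.

    Assumed similarity: W_ij = M_ij**2 / (P_i * P_j), with D_ii = sum_j W_ij.
    Every county is assumed to have at least one nonzero migration count,
    so that D_ii > 0.
    """
    M = np.asarray(M, dtype=float)
    P = np.asarray(P, dtype=float)
    assert np.allclose(M, M.T), "M must be symmetric (M_ij == M_ji)"
    W = M**2 / np.outer(P, P)       # assumed similarity W_ij
    d = W.sum(axis=1)               # diagonal of D
    return W / d[:, None]           # divide row i by D_ii

# Toy example with three hypothetical counties.
M = np.array([[0, 50, 10],
              [50, 0, 5],
              [10, 5, 0]])
P = np.array([1000, 2000, 500])
R = migration_random_walk(M, P)
print(R.sum(axis=1))  # each row of a random walk matrix sums to 1
```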

2.2 The methods we will apply 

In both of these applications, we will look at eigenvectors of matrices constructed from the data 
graph. Recall that, given a weighted graph $G = (V, E, W)$, one can define the Laplacian matrix as $L = D - W$, where $W$ is a weighted adjacency matrix, and where $D$ is a diagonal matrix whose $i$-th entry $D_{ii}$ is equal to the degree (or sum of weights) of the $i$-th node. Then consider the solutions to the generalized eigenvalue problem $Lx = \lambda Dx$. These are related to the solutions of the eigenvalue problem $Px = \lambda x$, where $P = D^{-1}W$ is a row-stochastic matrix that can be interpreted as the transition matrix of a Markov chain with state space equal to the nodes in $V$, and where $P_{ij}$ represents the transition probability of moving from node $i$ to node $j$ in one step. In particular, if $(\lambda, x)$ is an eigenvalue-eigenvector solution to $Px = \lambda x$, then $(1 - \lambda, x)$ is a solution to the generalized problem, i.e., $Lx = (1 - \lambda) Dx$.
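This correspondence can be checked numerically. The NumPy sketch below (the helper name `random_walk_eigs` and the random test graph are illustrative assumptions) computes eigenpairs of $P = D^{-1}W$ through the symmetric matrix $D^{-1/2} W D^{-1/2}$, which has the same eigenvalues as $P$, and then verifies that each pair also satisfies $Lx = (1 - \lambda) Dx$.

```python
import numpy as np

def random_walk_eigs(W):
    """Eigenpairs of P = D^{-1} W, computed via S = D^{-1/2} W D^{-1/2}.

    W must be symmetric with positive row sums. Returns eigenvalues in
    decreasing order and right eigenvectors of P as columns (P @ x = lam * x).
    """
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    S = s[:, None] * W * s[None, :]      # symmetric, same spectrum as P
    lam, U = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]       # reorder from top to bottom
    X = s[:, None] * U                   # x = D^{-1/2} u is an eigenvector of P
    return lam, X

# Check the correspondence on a small random weighted graph.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = A + A.T                              # symmetric, positive weights
D = np.diag(W.sum(axis=1))
L = D - W
P = W / W.sum(axis=1)[:, None]

lam, X = random_walk_eigs(W)
for j in range(len(lam)):
    x = X[:, j]
    assert np.allclose(P @ x, lam[j] * x)              # P x = lam x
    assert np.allclose(L @ x, (1 - lam[j]) * (D @ x))  # L x = (1 - lam) D x
print(lam[0])  # top eigenvalue of a row-stochastic P is 1 (up to rounding)
```

Working with the symmetric matrix lets `numpy.linalg.eigh` return real eigenvalues in a known order, avoiding the complex round-off that a general nonsymmetric solver can introduce.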

The top (resp. bottom) eigenvectors of the Markov chain (resp. generalized Laplacian eigenvalue) problem define the coarsest modes of variation or slowest modes of mixing, and thus these eigenvectors have a natural interpretation in terms of diffusions and random walks. As such, they have been widely studied in machine learning and data analysis to perform such tasks as partitioning, ranking, clustering, and visualizing the data [31] [29] [28] EHl SI E]. We are interested in localization not on these top eigenvectors, but on lower-order eigenvectors: for example, on the 41st eigenvector (or the 43rd, or . . . ), out of a total of hundreds of eigenvectors, in the CONGRESS data below. (As a matter of convention, we will refer to eigenvectors that are associated with eigenvalues that are not near the top part of the spectrum of the Markov chain matrix as low-order eigenvectors; thus, they actually correspond to larger eigenvalues in the generalized eigenvalue problem $Lx = \lambda Dx$.)

We will consider several measures to quantify the idea of localization in eigenvectors as arising when most of the components of an eigenvector are zero or near-zero. Perhaps most simply, we will consider histograms of the entries of the eigenvectors. More generally, let $V$ be a matrix consisting of the eigenvectors of $P = D^{-1}W$, ordered from top to bottom; let $V_j$ denote the $j$-th eigenvector and $V_{ij}$ the $i$-th element of this $j$-th vector; and let $\mathcal{N}_j = \sum_{i=1}^{n} V_{ij}^2$. Then:

• The $j$-componentwise-statistical-leverage ($j$-CSL) is an $n$-dimensional vector whose $i$-th element is $V_{ij}^2 / \mathcal{N}_j$. Thus, this measure is a score over nodes that describes how strongly the $j$-th eigendirection is localized on each node.

• The $j$-inverse participation ratio ($j$-IPR) is the number $\sum_{i=1}^{n} V_{ij}^4 / \mathcal{N}_j^2$. Thus, this measure is a score over eigendirections that describes how localized a given eigendirection is.

To gain intuition for these two measures, consider their behavior in the following limiting cases. If the $j$-th eigenvector is $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$, i.e., is very delocalized everywhere, then every element of the $j$-CSL is $1/n$, and the $j$-IPR is $1/n$. That is, they are both "small." On the other hand, if the $j$-th eigenvector is $(1, 0, \ldots, 0)$, i.e., is very localized, then the $j$-CSL is $(1, 0, \ldots, 0)$, and the $j$-IPR is 1. Thus, for both measures, higher values indicate the presence of localization, while smaller values indicate delocalization.
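Both measures are straightforward to compute from a matrix whose columns are eigenvectors. The following NumPy sketch (function names are illustrative) implements the two definitions and reproduces the limiting cases just described. Because both measures divide by $\mathcal{N}_j$, they do not depend on how each eigenvector is scaled.

```python
import numpy as np

def csl(V, j):
    """j-CSL: componentwise statistical leverage of column j, V_ij^2 / N_j."""
    v = V[:, j]
    return v**2 / np.sum(v**2)

def ipr(V):
    """j-IPR for every column j: sum_i V_ij^4 / N_j^2, with N_j = sum_i V_ij^2."""
    N = np.sum(V**2, axis=0)
    return np.sum(V**4, axis=0) / N**2

n = 5
delocalized = np.full(n, 1 / np.sqrt(n))    # (1/sqrt(n), ..., 1/sqrt(n))
localized = np.zeros(n)
localized[0] = 1.0                          # (1, 0, ..., 0)
V = np.column_stack([delocalized, localized])

print(ipr(V))      # approximately [1/n, 1]
print(csl(V, 0))   # every entry approximately 1/n
print(csl(V, 1))   # (1, 0, ..., 0)
```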

2.3 Related work in machine learning and data analysis 

The $j$-CSL is based on the idea of statistical leverage, which has been used to characterize localization on the top eigenvectors in statistical data analysis [19], while the $j$-IPR originated in quantum mechanics and has been applied to study localization on the top eigenvectors of complex networks [12]. Depending on whether one is considering the adjacency matrix or the Laplacian matrix, localized eigenvectors have been found to correspond to structural inhomogeneities such as very high-degree nodes or very small cluster-like sets of nodes [12] [13] El EH E]. More generally, localization on the top eigenvectors often has an interpretation in terms of the "centrality" or "network value" of a node [7], two ideas that are of use in applications such as viral marketing and immunizing against infectious agents. Localization on extremal eigenvectors has also found application in a wide range of problems, such as distributed control and estimation problems [3], as well as asymptotic space localization in sensor networks [15].

There has been a great deal of work on clustering and community detection that relies on the eigenvectors of graphs. Much of this work finds approximations to the best global partition of the data [27] EHl EH E3J. More recent work, however, has focused on local versions of the global spectral partitioning method [32] O [18], and this work can be interpreted as partitioning with respect to a locally biased vector computed from a locally biased seed set. Random walks have been of interest in machine learning and data analysis, both because of their usefulness in nonlinear dimensionality reduction methods such as Laplacian Eigenmaps and the related diffusion maps [U El [5] and because of their connections with spectral methods more generally [39] [20] ESI E3. One line of work that is related to ours, but from which ours should be differentiated, has to do with looking at the smallest eigenvectors of a graph Laplacian j2H [35]. These eigenvectors are not "buried" in the middle of the spectrum: they are associated with extremal eigenvalues, and they typically have to do with identifying bipartite structure in the graph.






There is a large body of work in mathematics and physics on the localization properties 
of the continuous Laplace operator, nearly all of which studies the localization properties of 
eigenfunctions associated with extremal eigenvalues, and there is also a rich literature on the 
relationship between the spectrum and the geometry of the domain. Only recently, however, has work advocated studying localized eigenfunctions associated with lower-order eigenvalues [15]. Also recently, it was noticed that low-order localization exists in two spatially distributed networks (the Migration data we report on here and a data set of mobile phone calls between cities in Belgium) and that this localization correlated with geographically meaningful regions [10].



3 Motivating empirical results 

In this section, we will illustrate low-order eigenvector localization for the two data sets described in Section 2, and we will show that in both cases the localization highlights interesting properties of the data.



3.1 Overview of empirical results 

To start, consider Figure 1. This figure illustrates the IPR for several toy data sets, for CONGRESS for several values of the connection parameter, and for Migration. In each case, the IPR is plotted as a function of the rank of the corresponding eigenvector. Figure 1(a) shows this plot for a discretization of a two-dimensional grid; and Figures 1(b) and 1(c) show this plot for a not-too-sparse $G_{np}$ random graph, where $G_{np}$ refers to the Erdős-Rényi random graph model on some number $n$ of nodes, with $p$ the connection probability between each pair of nodes [6]. These toy synthetic graphs represent limiting cases where measure concentration occurs and where delocalized eigenvectors are known to appear. More generally, the same delocalization holds for discretizations of other low-dimensional spaces, as well as for low-dimensional manifolds under the usual assumptions made in machine learning, i.e., those without bad "corners" and without pathological curvature or other pathological distributional properties. Not surprisingly, similar results are seen for other similar toy data sets that have been used to validate eigenvector-based algorithms in machine learning. Two things should be noted about these results: first, even in these idealized cases, the IPR is not perfectly uniform, even for large values of the rank parameter, although the nonuniformity due to the random noise is relatively modest and seemingly unstructured; and second, when the data are sparser, e.g., when the connection probability $p$ is smaller in the random graph model, the nonuniformity due to noise is somewhat more pronounced.
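The qualitative behavior of the toy panels can be explored with a short script. The NumPy sketch below is illustrative only: the grid side length, $n$, $p$, and the random seed are arbitrary choices, not the settings used for Figure 1, and the helper names are hypothetical. It builds a two-dimensional grid and a $G_{np}$ graph, computes the eigenvectors of $D^{-1}W$ ordered from top to bottom, and returns the IPR of each eigenvector as a function of its rank.

```python
import numpy as np

def grid_adjacency(m):
    """Adjacency matrix of an m x m two-dimensional grid (4-nearest neighbours)."""
    path = np.diag(np.ones(m - 1), 1)
    path = path + path.T                          # path graph on m nodes
    I = np.eye(m)
    return np.kron(path, I) + np.kron(I, path)    # Cartesian product of two paths

def gnp_adjacency(n, p, seed=0):
    """Adjacency matrix of an Erdos-Renyi G(n, p) random graph (no self-loops)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def ipr_by_rank(W):
    """IPR of each eigenvector of D^{-1} W, ordered from the top eigenvalue down."""
    d = W.sum(axis=1)
    if np.any(d == 0):
        raise ValueError("graph has isolated nodes; D^{-1} is undefined")
    s = 1.0 / np.sqrt(d)
    _, U = np.linalg.eigh(s[:, None] * W * s[None, :])  # ascending eigenvalues
    X = s[:, None] * U[:, ::-1]                          # top-to-bottom eigenvectors of P
    N = np.sum(X**2, axis=0)
    return np.sum(X**4, axis=0) / N**2

# Arbitrary illustrative sizes, not those used for Figure 1.
ipr_grid = ipr_by_rank(grid_adjacency(20))         # 400 nodes
ipr_gnp = ipr_by_rank(gnp_adjacency(400, 0.05))    # 400 nodes
print(ipr_grid.max(), ipr_gnp.max())
```

Plotting `ipr_grid` and `ipr_gnp` against their index gives curves analogous to panels (a) to (c); lowering `p` makes the graph sparser.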

Next, Figures 1(d), 1(e), 1(f), and 1(g) illustrate the IPR for the CONGRESS data for several different values of the parameter defining the strength of interactions between successive Congresses, and Figure 1(h) illustrates the IPR for the Migration data. In all these cases, the IPR indicates that many of the low-order eigenvectors are significantly more localized than earlier eigenvectors. Moreover, the localization is robust in the sense that similar (but often noisier) results are obtained if the details of the kernel connecting different counties are changed or if the connection parameter between individuals in successive Congresses is modified within reasonable ranges. This is most prominent in the CONGRESS data. For example, when the connection parameter is small, e.g., $\epsilon = 0.1$, as it was in the original applications [38, 22], there is a significant localization-delocalization transition taking place between the 40th and 41st eigenvectors. (The significance of this will be described below, but recall that the data consist of 41 Congresses. If the CONGRESS data set were artificially truncated to consist of some number of Congresses other than 41, then this transition would take place at some other location in the spectrum, and we would have illustrated those eigenvectors.) Note, however, that when the connection parameter is increased from $\epsilon = 0.1$ to $\epsilon = 1$ and above, the low-order localization becomes much less structured. In addition, unpublished results clearly indicate that in this case the structures highlighted by the low-order localization are much noisier and much less meaningful to the domain scientist.

[Figure 1 panels: (a) Two-dimensional grid. (b) Random graph $G(n, p = 0.01)$. (c) Random graph $G(n, p = 0.03)$. (d) CONGRESS, $\epsilon = 0.01$. (e) CONGRESS, $\epsilon = 0.1$. (f) CONGRESS, $\epsilon = 1$. (g) CONGRESS, $\epsilon = 10$. (h) Migration data.]

Figure 1: Inverse Participation Ratio (IPR), as a function of the rank of the corresponding eigendirection, for several data graphs. For grids and other well-formed low-dimensional meshes, as well as for not-extremely-sparse random graphs, all eigenvectors are fairly delocalized and the IPR is relatively flat. For CONGRESS and Migration, there is substantial localization on low-order eigenvectors.

3.2 The Congress data 

For a more detailed understanding of the localization phenomenon for the CONGRESS data (when $\epsilon = 0.1$), consider Figures 2, 3, and 4. Figure 2 presents a pictorial illustration of the top several eigenvectors and several of the lower-order eigenvectors. (Note that the numbering starts with the first nontrivial eigenvector.) These particular eigenvectors have been chosen to illustrate: the top three directions defining the coarsest modes of variation in the data; the three eigenvectors above and the three eigenvectors below the low-order localization-delocalization transition; and three eigenvectors further down in the spectrum. The first three eigenvectors are fairly delocalized and exhibit the global oscillatory behavior characteristic of sinusoids that might be expected for data that "looked" coarsely one-dimensional. Eigenvectors 38 to 40 are quite far down in the spectrum; interestingly, they exhibit some degree of localization, perhaps more than one would naively expect, but are still fairly delocalized relative to subsequent eigenvectors. Starting with the 41st eigenvector, and continuing with many more eigenvectors that are not illustrated, one sees a remarkable transition: although they are quite far down in the spectrum, these eigenvectors exhibit a remarkable degree of localization, very often on a single Congress or a few temporally adjacent Congresses. (Note that in these and other figures the Y-axis often differs from subfigure to subfigure. While this makes it harder to compare different plots, the alternative would involve losing the resolution along the Y-axis for all but the most localized eigenvectors.) Figure 3 shows the CSL for these twelve eigenvectors, and Figure 4 shows a histogram of the entries for each of these twelve eigenvectors. By both of these measures, very pronounced localization is clearly observed, complementing the observations in the previous figure.
Figure 2: The CONGRESS data: illustration of several of the eigenvectors, when the inter-Congress coupling is set to $\epsilon = 0.1$. (Recall that the X-axis essentially corresponds to time.) Shown are the top eigenvectors and several of the lower-order eigenvectors that exhibit varying degrees of localization.

As an illustration of the significance of the structure highlighted by these low-order eigenvectors, note that only 0.73% of the spectrum is captured by the 41st eigenvector and that over 99.9% of the ($\ell_2$) "mass" (and 92.3% of the $\ell_1$ mass) of the 41st eigenvector is on individuals who served in the 108th Congress. Similarly, only 0.42% of the spectrum is captured by the 43rd eigenvector, and 98.5% of the ($\ell_2$) "mass" (and 71.7% of the $\ell_1$ mass) of the 43rd eigenvector is on individuals who served in the 106th Congress. Similar results are seen for many (but certainly not all) of the low-order eigenvectors. That is, in many cases, although these low-order eigenvectors account for only a small fraction of the variance in the data, they are often strongly localized on a single Congress (or, as Figure 2 illustrates, a small number of temporally adjacent Congresses), i.e., at a single time step of the time series of voting data. In part because of this, these low-order eigenvectors can in some cases be used to perform common machine learning and data analysis tasks.
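The percentages quoted above are simple ratios. The NumPy sketch below assumes that the "fraction of the spectrum" captured by eigenvector $j$ is the normalized squared eigenvalue $\lambda_j^2 / \sum_i \lambda_i^2$ (as in the normalized square spectrum of Figure 5), that the $\ell_2$ "mass" is the sum of squared entries, and that the subset of nodes (e.g., the members of one Congress) is given as a boolean mask. The function names and toy vectors are hypothetical.

```python
import numpy as np

def spectrum_fraction(lam, j):
    """Normalized squared spectrum lam_j^2 / sum_i lam_i^2 for eigenvalue j."""
    lam = np.asarray(lam, dtype=float)
    return lam[j]**2 / np.sum(lam**2)

def mass_on_subset(v, mask):
    """Fractions of the l2 mass (squared entries) and l1 mass of v carried by nodes in mask."""
    v = np.asarray(v, dtype=float)
    l2 = np.sum(v[mask]**2) / np.sum(v**2)
    l1 = np.sum(np.abs(v[mask])) / np.sum(np.abs(v))
    return l2, l1

# Toy eigenvector: nodes 0 and 1 stand in for the members of a single Congress.
v = np.array([3.0, 4.0, 0.5, -0.5])
mask = np.array([True, True, False, False])
print(mass_on_subset(v, mask))
print(spectrum_fraction([2.0, 1.0, 1.0], 0))
```

Squaring the entries gives more weight to the largest components, which is one reason the $\ell_2$ fractions quoted above exceed the corresponding $\ell_1$ fractions.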

Figure 3: The CONGRESS data: the CSL scores of the eigenvectors that were shown in Figure 2, clearly indicating strong localization on some of the low-order eigenvectors.

Consider, for example, spectral clustering, which involves partitioning the data by performing a "sweep cut" over an eigenvector computed from the data. The first eigenvector shown in Figure 2 clearly illustrates that a sweep cut over the first nontrivial eigenvector of the Laplacian of the full data set will partition the network based on time, i.e., into a temporally earlier cluster and a temporally later cluster. Not surprisingly, low-order eigenvectors can highlight very different structures in the data. For example, by performing a sweep cut over the first nontrivial eigenvector of the Laplacian of the subnetwork induced by the nodes in the 110th Congress, one obtains the same partition (basically, a partition along party lines [26, 38, 22]) as when the sweep cut is performed on the 41st eigenvector of the Laplacian of the full data set. This is illustrated in Figure 5. Clearly, there is a strong correlation, as that low-order eigenvector is effectively finding the partition of the 110th Congress into two parties. (Indeed, the color-coding in Figure 5 corresponds to party affiliation.) As a consequence, other clustering and classification tasks lead to similar or identical results, whether one considers the second eigenvector of the Laplacian of the subnetwork induced by the nodes in the 110th Congress or the 41st eigenvector of the Laplacian of the full data set. Similar results hold for many of the other low-order eigenvectors, especially when the localization is very pronounced.
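Since the sweep-cut criterion is not specified here, the following NumPy sketch assumes a common choice: sort the nodes by their eigenvector entries and keep the prefix set $S$ with the smallest conductance $\phi(S) = \mathrm{cut}(S, \bar S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar S))$. The helper names and the toy graph (two 4-node cliques joined by a single edge) are hypothetical. The same `sweep_cut` function can be applied to any eigenvector, including a low-order one.

```python
import numpy as np

def first_nontrivial_eigvec(W):
    """Eigenvector of P = D^{-1} W for its second-largest eigenvalue."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    _, U = np.linalg.eigh(s[:, None] * W * s[None, :])  # ascending eigenvalues
    return s * U[:, -2]                                  # x = D^{-1/2} u

def sweep_cut(W, x):
    """Return (mask, conductance) of the best prefix set when nodes are sorted by x.

    Assumed criterion: minimum conductance cut(S, rest) / min(vol(S), vol(rest)).
    """
    n = len(x)
    d = W.sum(axis=1)
    total = d.sum()
    in_S = np.zeros(n, dtype=bool)
    cut = vol = 0.0
    best_phi, best_mask = np.inf, None
    for i in np.argsort(x)[:-1]:          # proper, nonempty prefixes only
        w_to_S = W[i, in_S].sum()         # weight from i into the current set
        # Moving i into S removes edges i-S from the cut and adds edges i-(rest).
        cut += d[i] - W[i, i] - 2.0 * w_to_S
        vol += d[i]
        in_S[i] = True
        phi = cut / min(vol, total - vol)
        if phi < best_phi:
            best_phi, best_mask = phi, in_S.copy()
    return best_mask, best_phi

# Two 4-node cliques joined by the single edge (3, 4).
K = np.ones((4, 4)) - np.eye(4)
W = np.zeros((8, 8))
W[:4, :4] = K
W[4:, 4:] = K
W[3, 4] = W[4, 3] = 1.0

mask, phi = sweep_cut(W, first_nontrivial_eigvec(W))
print(np.flatnonzero(mask), phi)   # one clique, conductance 1/13
```

The sign of an eigenvector is arbitrary, so the returned set may be either clique; the partition itself is the same.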



3.3 The Migration data 

For a more detailed understanding of the localization phenomenon for the Migration data, consider Figures 6 and 7 [10]. Figure 6 provides a pictorial illustration of the top eigenvectors as well as several of the lower-order eigenvectors of the county-to-county migration matrix. As with the CONGRESS data, the Migration data demonstrate characteristic global oscillatory behavior on the top three eigenvectors; and many of the low-order eigenvectors are fairly localized in a way that seems to correspond to interesting domain-specific characteristics. In particular, some of the low-order eigenvectors that localize very well seem to reveal small, geographically cohesive regions that correlate remarkably well with political and administrative boundaries. In addition, Figure 7 shows a histogram of the entries for each of these eigenvectors, quantifying the degree of localization. Recent work on analyzing migration patterns using this data set highlights cosmopolitan or hub-like regions, as well as isolated regions that emerge when there is a high measure of separation between a cluster and its environment, some of which are discovered by the localization properties of low-order eigenvectors [30]. Our observations are also consistent with previous observations on the localization properties of the Migration data [10].

Clearly, in both the CONGRESS data and the Migration data, there is more going on in the spectrum than we have discussed, and it is not obvious to what extent these features represent real properties of the data or are simply artifacts of noise. For example, there is a fairly strong tendency in the CONGRESS data for localization to occur on very early Congresses or very late Congresses; when this happens, eigenvectors with localization on recent Congresses tend to account for a larger fraction of the variance of the data than eigenvectors with localization on much older Congresses; and so on. In addition, there are also many other low-order eigenvectors in these two data sets that are delocalized, noisy, and seemingly meaningless in terms of the domain from which the data are drawn. We will discuss these and other issues below. Our point here is simply to illustrate three things: that there can exist a substantial degree of localization on certain low-order eigenvectors; that this localization can highlight properties of the data that are of interest to the domain scientist (temporally local information, such as a party-line partition of a single Congress, or small geographically cohesive regions that have experienced nontrivial migration patterns); and that these properties are not highlighted among the coarsest modes of variation of the data when the data are viewed globally.

Figure 4: The CONGRESS data: histograms of the entries of the eigenvectors that were shown in Figure 2, clearly indicating strong localization on some of the low-order eigenvectors.

4 A simple model 

In this section, we will describe a simple model that exhibits low-order eigenvector localization. This model qualitatively reproduces several of the results that were empirically observed in Section 3, and it can be used as a diagnostic tool to help extract insight from data graphs when such low-order eigenvector localization is present.

4.1 Description of the TwoLevel model 

To motivate our TwoLevel model, consider what the CONGRESS data "looks like" if one 
"squints" at it, i.e., in "coarse-grained" sense. In this case, most edges are between different 
members of a single Congress, i.e., they are temporally-local at a single time-slice; and the re- 
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(a) Illustration of Congress in the (b) Normalized square spectrum, 
form of a "spy" plot. 



Republicans 
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(c) Partitioning based on the 41 s 
eigenvector. 



Figure 5: First panel: A "spy" plot of the CONGRESS data. The blocks on the diagonal correspond 
to the voting patterns in each of the 41 Congresses, and the off-diagonal entries take the value 
e = 0.1 when a single individual served in two successive Congresses. Second panel: Barplot of the 

A 2 

normalized square spectrum of the Congress matrix, i.e., ^ n i . 2 , for i = 1, . . . , 100, indicating 
that the low-order eigenvalues account for a relatively-small fraction of the variance in the data. 
Third panel: Plot of spectral clustering based on the first nontrivial eigenvector VqI of the 
matrix G20065 where G2006 denotes the full CONGRESS restricted to the senators from the 110 th 



Congress (which includes the years 2006 and 2007). If we let ?4u06 denote the restriction of the 
(localized) 41 st eigenvector of full CONGRESS data to the the senators in the 110 th Congress, then 
< 3 x 10 -3 , and an identical partition and plot (at the level of resultuion of this 
figure) is generated by partitioning by performing a sweep cut along ^2006- 



G2006 



,(41) 
^2006 I 



mainder of the edges are between a single individual in two consecutive Congresses, i.e., they 
are still fairly temporally-local. That is, there is some structured graph (structured depending 
on the details of the voting pattern in any particular Congress) for which the temporally-local 
connections are reasonably strong (assuming that the connection parameter between individu- 
als in successive Congresses is not extremely small or extremely large) that is "evolving" along 
a one-dimensional temporal scaffolding. Thus, if one "zooms in" and looks locally at a single 
Congress, then the properties of that Congress should be apparent. For example, the best partition
computed by a spectral clustering algorithm for any single Congress is typically strongly
correlated with party affiliation [26, 38, 22]. On the other hand, if one "zooms out" and looks at
the entire graph, then the linear time-series structure should be apparent and the properties of
any single Congress should be less important. For example, the best partition computed by a
spectral clustering algorithm for the entire data set splits the data into the first temporal half and
the second temporal half and thus fails to see party affiliations.
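The "best partition computed by a spectral clustering algorithm," and the sweep cut used for Figure 5, can be illustrated with a short sketch. This is not the authors' code: it assumes NumPy, a symmetric graph with no isolated nodes, the normalized matrix $D^{-1/2} A D^{-1/2}$ as the spectral operator, and conductance as the sweep criterion. The function name `sweep_cut` and the toy graph are hypothetical.

```python
import numpy as np

def sweep_cut(A):
    """Hypothetical helper: sweep along the first nontrivial eigenvector of
    D^{-1/2} A D^{-1/2} and return the prefix set of minimum conductance.
    Assumes a symmetric adjacency matrix with no isolated nodes."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues in ascending order: column -1 is the trivial
    # top eigenvector, column -2 is the first nontrivial one.
    _, vecs = np.linalg.eigh(M)
    v = d_inv_sqrt * vecs[:, -2]
    order = np.argsort(v)
    vol_total = d.sum()
    in_set = np.zeros(len(d), dtype=bool)
    best_phi, best_k = np.inf, 1
    for k in range(1, len(d)):                 # prefixes of size 1 .. n-1
        in_set[order[k - 1]] = True
        cut = A[in_set][:, ~in_set].sum()      # edge weight leaving the prefix
        vol = d[in_set].sum()
        phi = cut / min(vol, vol_total - vol)  # conductance of the prefix
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k].tolist()), best_phi

# Toy example: two 5-cliques joined by a single edge.
n = 5
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = 1.0
A[n:, n:] = 1.0
np.fill_diagonal(A, 0.0)
A[n - 1, n] = A[n, n - 1] = 1.0
S, phi = sweep_cut(A)
print(sorted(S), phi)
```

On this toy graph the sweep should return one of the two cliques, with conductance $1/21$ (one cut edge over a volume of 21 on each side).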

In cases such as this, where there are two different "size scales" to the interactions, a zero-th 
order model for the data may be given by the following tensor product structure. Let W be 
a "base graph" representing the structure of "local" interactions at local or small size scales. 
For example, this could be a simple model for the voting patterns within a single Congress; 
or this could represent the inter-county migration patterns within a single state or geopolitical 
region. In addition, let N be an "interaction model" that governs the "global" interaction between 
different base graphs W. For example, this could be a "banded" or "tridiagonal" matrix, in which 
the nonzero components above and below the diagonal represent the connection links between 
two Congresses at adjacent time steps; or this could be a discretization of a low-dimensional 
manifold representing the geographical connections in a nation, if spatially-local couplings are 
Figure 6: The Migration data: pictorial illustration of several of the eigenfunctions. Shown are
the top eigenfunctions and several of the lower-order eigenfunctions that exhibit varying degrees
of localization.

most important; or this could even be a more general noise model in which edges are added 
randomly between every pair of nodes (if, e.g., the connections between different base graphs are 
much less structured, as in social and information networks [17]). Then, a simple zero-th order
model, which we will denote the TwoLevel model, is given by

$$G = H + N, \qquad \text{where } H = I \otimes W,$$

where $I$ is the identity matrix (of dimension equal to the number of base graphs) and $H = I \otimes W$ denotes the tensor product between $I$ and $W$.
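As a concrete illustration of $G = H + N$ with $H = I \otimes W$, the following sketch assembles $G$ with NumPy's `np.kron`. The helper name `two_level` and the toy inputs are hypothetical; they are chosen so that a single-edge base graph, copied three times and linked end to end by $N$, yields a path on six nodes.

```python
import numpy as np

def two_level(W, N):
    """Hypothetical helper: G = H + N with H = I_k (x) W, where k is the
    number of base graphs, inferred from the sizes of N and W."""
    n = W.shape[0]
    k, rem = divmod(N.shape[0], n)
    if rem or N.shape != (k * n, k * n):
        raise ValueError("N must be (k*n) x (k*n) for an n x n base graph W")
    H = np.kron(np.eye(k), W)          # block diagonal: k identical copies of W
    return H + N

W = np.array([[0.0, 1.0], [1.0, 0.0]])  # base graph: a single edge
N = np.zeros((6, 6))
for j in range(2):                       # link bead j to bead j + 1
    a, b = 2 * j + 1, 2 * (j + 1)
    N[a, b] = N[b, a] = 1.0
G = two_level(W, N)
print(G)
```

Note that the literal tensor product places the *same* $W$ in every block; models whose beads are sampled independently, or differ in type, need an explicit block-diagonal assembly instead.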

In what follows, we will illustrate the properties of the TwoLevel model in several idealized 
settings. To do so, we will consider the base graph W to be either "structured" or "unstructured," 
and we will also consider the interaction model N to be either "structured" or "unstructured." 

• For the base graph, W, we will model the unstructured case by a single unstructured Erdos-
Renyi random graph [6], $G_{n,p_1}$, on some number $n$ of nodes, where the connection probability
between each pair of nodes is $p_1$; and we will model the structured case by a so-called 2-
module. By a "2-module," we mean two Erdos-Renyi random graphs, where intra-module
nodes are randomly connected with probability $p_1$ and inter-module nodes are connected
with some much lower probability $p_2$. (This 2-module is structured in the sense that the







Figure 7: The Migration data: histograms of the entries of the eigenfunctions that were shown
in Figure 6, clearly indicating localization on some of the low-order eigenfunctions.

top nontrivial eigenvector of the 2-module graph is the Fiedler vector, which would clearly separate
the two modules.)

• For the interaction model, N, we will model structured noise as a "path graph," i.e., a
tree with two or more vertices that is not branched at all and which thus has a "banded" 
adjacency matrix; and we will model unstructured noise by randomly connecting any two 
nodes in different modules with some small probability, i.e., by an Erdos-Renyi random 
graph with some small connection probability p. 
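The structured and unstructured choices for $W$ and $N$ described above can be sketched as random-graph generators. This is a minimal sketch, assuming NumPy. The helper names `erdos_renyi`, `two_module`, and `interaction` are hypothetical. We also assume that a 2-module splits its $n$ nodes into two equal halves. The structured $N$ places random edges (probability $p$) only between nodes of consecutive beads, which is one reading of the Figure 8 setup. The unstructured $N$ allows such edges between any two different beads.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Hypothetical helper: symmetric G(n, p) adjacency with no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def two_module(n, p1, p2, rng):
    """Hypothetical helper: nodes split into two halves; edges appear with
    probability p1 inside a half and p2 across the halves."""
    side = np.arange(n) >= n // 2
    prob = np.where(side[:, None] == side[None, :], p1, p2)
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(float)

def interaction(k, n, p, structured, rng):
    """Hypothetical helper: random edges with probability p on k*n nodes,
    allowed only between consecutive beads (structured: path scaffolding)
    or between any two different beads (unstructured)."""
    if structured:
        bead_adj = np.eye(k, k=1) + np.eye(k, k=-1)   # path graph on k beads
    else:
        bead_adj = 1.0 - np.eye(k)                    # any two distinct beads
    allowed = np.kron(bead_adj, np.ones((n, n))) > 0
    upper = np.triu((rng.random((k * n, k * n)) < p) & allowed, k=1)
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(0)
k, n = 5, 20
W = two_module(n, 0.8, 0.2, rng)          # structured base graph
N = interaction(k, n, 0.05, True, rng)    # structured interaction model
G = np.kron(np.eye(k), W) + N             # a "path graph of 2-modules"
```

Replacing `W` with `erdos_renyi(n, p, rng)`, or passing `structured=False`, gives the other three combinations of structured and unstructured levels.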

Clearly, for both the base graph and for the interaction model, these are limiting cases. For 
example, rather than consider a 2-module as the base graph, one could consider a 3-module to 
model the existence of a good tri-partition of the base graph, a 4-module, etc. Similarly, rather 
than just considering interactions along a one-dimensional scaffolding, one could consider interactions along
a two-dimensional scaffolding, etc. Unpublished empirical results indicate that, for both the base
graph and for the interaction model, by considering these weaker forms of structure (in particular, 
3-modules rather than 2-modules or a two-dimensional scaffolding rather than a one-dimensional 
scaffolding), we obtain results that are similar to but intermediate between the structured and 
unstructured results that we report below. Formalizing this more generally and understanding 
the theoretical and empirical implications of perturbations of tensor product matrices is an open 
problem raised by our observations. 
4.2 Empirical properties of the TwoLevel model

Here, we will examine the behavior of the TwoLevel model for various combinations of struc-
tured and unstructured graphs for the base graph and the interaction model. Our goal will
be to reproduce qualitatively some of the properties we observed in Section 3 and to understand
their behavior in terms of the parameters of the TwoLevel model.

To begin, Figure 8 illustrates a graph consisting of several hundred nodes organized as a "path
graph of 2-modules"; that is, it consists of five 2-modules connected together as beads along a
one-dimensional scaffolding. (All of the figures for the behavior of the TwoLevel model contain
a subset of: a pictorial illustration of the graph in the form of a "spy" plot; the IPR scores, as
a function of the rank of the eigenvector; a barplot of the normalized square spectrum; plots of
several of the eigenvectors; and the corresponding statistical leverage scores. Figure 8 plots all of
these quantities.) The first four nontrivial eigenvectors in Figure 8 are fairly constant along each
of the beads, and they exhibit the characteristic sinusoidal oscillations that one would expect
from eigenfunctions of the Laplacian on the continuous line or a discrete path graph. The next
five eigenvectors are much more localized; they tend to be localized either on a single bead
at the endpoints of the path or on a small number of nearby beads in the middle of the path.
In addition, on the fifth and sixth eigenfunctions, which are localized on a single 2-module, there
is a natural partition of that 2-module based on the sign of the eigenvector, and that partition
splits the 2-module into its two separate modules. Later eigenvectors are still more localized than
leading-order eigenvectors, at least by the IPR measure, but they do not seem to be localized in
such a way as to yield insight into the data.
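The per-eigenvector quantities plotted in these figures can be computed along the following lines. This sketch assumes NumPy, the normalized matrix $D^{-1/2} A D^{-1/2}$, and the standard definitions (the paper's own conventions are set out in its earlier sections). Under these definitions, the IPR of a unit vector $v$ is $\sum_i v_i^4$, which equals $1/n$ for a perfectly delocalized vector and $1$ for a vector concentrated on a single node. The normalized square spectrum is $\lambda_i^2 / \sum_j \lambda_j^2$. The leverage score of node $i$ with respect to the top-$k$ eigenvectors is the squared norm of row $i$ of $V_k$. The function name `spectral_diagnostics` is hypothetical.

```python
import numpy as np

def spectral_diagnostics(A, k):
    """Hypothetical helper: eigen-decompose D^{-1/2} A D^{-1/2} and return,
    ordered from the largest eigenvalue down, the eigenvalues, the IPR score
    of each eigenvector, the normalized square spectrum, and the leverage
    scores of the nodes with respect to the top-k eigenvectors."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(M)
    vals, vecs = vals[::-1], vecs[:, ::-1]       # largest eigenvalue first
    ipr = np.sum(vecs ** 4, axis=0)              # columns of vecs are unit-norm
    sq_spectrum = vals ** 2 / np.sum(vals ** 2)  # lambda_i^2 / sum_j lambda_j^2
    leverage = np.sum(vecs[:, :k] ** 2, axis=1)  # squared row norms of V_k
    return vals, ipr, sq_spectrum, leverage

# Cycle on 6 nodes: the top eigenvector is constant, so its IPR is 1/6.
A = np.eye(6, k=1) + np.eye(6, k=-1)
A[0, 5] = A[5, 0] = 1.0
vals, ipr, sq_spectrum, leverage = spectral_diagnostics(A, k=3)
```

Plotting `ipr` against the eigenvector rank gives the IPR panels. Localized low-order eigenvectors show up as spikes well above the $1/n$ floor.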

Next, Figures 9 and 10 present the same results for two modifications of this basic setup.
Figure 9 does so for an "unstructured graph of 2-modules," i.e., for five 2-modules connected
with random interactions. In this case, low-order eigenvector localization is still present, but it is
much less prominent by the IPR measure, and it is significantly noisier when the eigenvectors
themselves are visualized. Also, and not surprisingly, the situation becomes noisier still if the off-
diagonal noise is increased. Figure 10 presents results for a "path graph of unstructured graphs,"
i.e., several random unstructured graphs organized as beads along a one-dimensional scaffolding.
Again, low-order eigenvector localization is still present, but again the situation is significantly
noisier. Note, though, that although the localization does not lead to most of the mass on
low-order eigenvectors being localized on a single bead, there is still a tendency for localization
to occur at the endpoints of the path.

Finally, in order to understand the effect of varying the structure of the base modules on the
localization properties of the eigenvectors, Figures 11 and 12 illustrate the situation when the
beads of the path graph are of two different types: unstructured Erdos-Renyi random graphs (to be
denoted by "E"); and structured 2-modules (to be denoted by "2"). Combining beads in this way
is of interest since it may be thought of as a zero-th order model of, e.g., a more-or-less polarized
Congress. The former figure illustrates the case when most of the beads are unstructured (in
the order EE2E2), while the latter illustrates the case when most of the beads are structured
and a few are less-structured (in the order 22E2E). For the EE2E2 situation, the low-order
eigenvectors highlight the two relatively more-structured 2-modules, starting with the one at the
endpoint, although there is some residual structure highlighted by low-order eigenvectors on the
unstructured E beads. Conversely, for the 22E2E case, the 2-modules tend to be highlighted;
the E beads tend to be lost, but they do tend to make the localization on nearby 2-modules less
pronounced.
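Because the literal product $I \otimes W$ repeats one base graph, mixed paths such as EE2E2 need a block-diagonal assembly. The following is a minimal sketch, assuming NumPy. The helper names `two_module`, `matched_er`, and `bead_path` are hypothetical. We read "p to match the edge densities inside the beads" as choosing the E bead's $p$ equal to the expected edge density of a 2-module bead with the same number of nodes. Consecutive beads are linked by random edges with probability 0.05, as in the captions of Figures 11 and 12.

```python
import numpy as np

def two_module(n, p1, p2, rng):
    """Hypothetical helper: two halves, density p1 inside, p2 across."""
    side = np.arange(n) >= n // 2
    prob = np.where(side[:, None] == side[None, :], p1, p2)
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(float)

def matched_er(n, p1, p2, rng):
    """Hypothetical helper: Erdos-Renyi bead whose p equals the expected
    edge density of a two_module(n, p1, p2) bead."""
    h = n // 2
    intra = h * (h - 1) / 2 + (n - h) * (n - h - 1) / 2   # within-half pairs
    inter = h * (n - h)                                   # cross-half pairs
    p = (intra * p1 + inter * p2) / (n * (n - 1) / 2)
    return two_module(n, p, p, rng)     # equal densities: a plain G(n, p)

def bead_path(beads, p, rng):
    """Hypothetical helper: block-diagonal beads (any types or sizes) plus
    random edges, with probability p, only between consecutive beads."""
    sizes = [b.shape[0] for b in beads]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    G = np.zeros((offsets[-1], offsets[-1]))
    for j, B in enumerate(beads):
        s = slice(offsets[j], offsets[j + 1])
        G[s, s] = B
    for j in range(len(beads) - 1):
        a = slice(offsets[j], offsets[j + 1])
        b = slice(offsets[j + 1], offsets[j + 2])
        links = (rng.random((sizes[j], sizes[j + 1])) < p).astype(float)
        G[a, b] = links
        G[b, a] = links.T
    return G

rng = np.random.default_rng(1)
n = 20
make = {"2": lambda: two_module(n, 0.8, 0.2, rng),
        "E": lambda: matched_er(n, 0.8, 0.2, rng)}
G = bead_path([make[c]() for c in "EE2E2"], 0.05, rng)
```

Changing the string `"EE2E2"` to `"22E2E"` produces the second configuration.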
Figure 8: Results from the TwoLevel model, where the parameters have been set as a "path
graph of 2-modules," with edge densities $p_1 = 0.8$, $p_2 = 0.2$, and where each pair of nodes from
consecutive 2-modules is connected with probability $p = 0.05$. Top left is a pictorial illustration
of the graph in the form of a "spy" plot. Top middle is the IPR scores, as a function of the rank
of the eigenvector. Top right is a barplot of the normalized square spectrum, i.e., $\lambda_i^2 / \sum_{j=1}^{n} \lambda_j^2$, for
$i = 1, \ldots, 65$. Next two rows are the top 12 eigenvectors. Last two rows are the corresponding
statistical leverage scores.



4.3 Theoretical considerations 

The empirical results on the TwoLevel model demonstrate that a very simple tensor product 
construction can shed light on some of the empirical observations for the CONGRESS data and the 
Migration data that were made in Section 3. More generally, the TwoLevel model may be
used as a diagnostic tool to help extract insight that is useful for a downstream analyst from data 
graphs when such low-order eigenvector localization is present. To help gain insight into "why" 
our empirical observations hold, here we will provide some insight that is guided by theory. A 
detailed theoretical understanding of the TwoLevel model is beyond the scope of this paper, as 
it would require a matrix perturbation analysis of the tensor product of structured matrices. This 
is a technically-involved topic, in part since a straightforward application of matrix perturbation 
ideas tends to "wash out" the bottom part of the spectrum [31].

Instead of attempting to provide this, we will illustrate how many of the empirical results can 
be "understood" as a consequence of several rules-of-thumb that are well-known to practitioners 
Figure 9: Results from the TwoLevel model, where the parameters have been set as an "unstruc-
tured graph of 2-modules," where each 2-module W has edge densities $p_1 = 0.8$ and $p_2 = 0.2$, and
where the unstructured N is a random graph with $p = 0.02$. Shown are: a pictorial illustration
of the graph; the IPR scores; the normalized square spectrum; and the top 10 eigenvectors.

of eigenvector-based machine learning and data analysis tools. 

• First, recall that tensor product constructions lead to separable eigenstates. In particular,
for the TwoLevel model, the spectrum of $H$ is related in a simple way to those of $I$ and
$W$: its eigenvalues are just the products of the eigenvalues of $I$ and $W$, and the
corresponding eigenvectors of $H$ are the tensor products of the eigenvectors of $I$ and $W$.
For example, if $1$ and $v_1$ are the top eigenvalue/eigenvector of $I$, and $\delta$ and $u_1$ those of $W$, then
the eigenvector of $H$ corresponding to the top eigenvalue $1 \cdot \delta = \delta$ is $v_1 \otimes u_1$; and so on.
Assuming that the perturbation caused by the interaction model N is "sufficiently weak"
relative to the base graph W, this suggests three things: first, that the top eigenvectors of
the full graph will not "see" the internal structure of the base graph W; second, that the
number of these top eigenvectors will equal the number of base graphs (minus one, if the
trivial eigenvector is not counted); and third, that properties of the eigenvectors of the base
graph W may manifest themselves in subsequent low-order eigenvectors of the full graph.
All of these phenomena are clearly observed in the empirical results for the TwoLevel model, as
well as for the CONGRESS data when the inter-Congress couplings are small to moderate.
When the inter-Congress couplings become larger, the interaction model is less weak, in
which case the situation is much noisier and more complex. Similarly, for the Migration
data, there is some geographically-local structure illustrated in the low-order eigenvectors,
but the situation is much noisier, suggesting that the interaction model N is more complex
or that a simple separation of scales in a tensor product construction is less appropriate for
these data.

• Second, recall that eigenvectors have strong connections with diffusions. For example, the 
Figure 10: Results from the TwoLevel model, where the parameters have been set as a "path
graph of unstructured graphs," where each base graph W is a random graph $G(n = 100, p = 0.2)$,
and where each pair of nodes from consecutive base graphs is connected with probability 0.01.
Shown are: a pictorial illustration of the graph; the IPR scores; the normalized square spectrum;
and the top 10 eigenvectors.

power method can be used to compute the top eigenvector of certain matrices, and random 
walks can be used to compute vectors which find good partitions of the data. Empirical 
results on the TwoLevel data illustrate that when the base graph W is structured (a 
2-module with a good bipartition, as opposed to an unstructured random graph) and/or 
when the interaction model N is structured (a path graph, as opposed to an unstructured 
random graph) then the low-order localization is most pronounced. (This may be seen as 
a consequence of the implicit "isoperimetric capacity control" associated with diffusing in 
very low-dimensional spaces or when there are very good bipartitions of the data. Formalizing
these trade-offs would provide a precise but nontrivial sense in which the perturbation caused
by the interaction model N is "sufficiently weak" relative to the base graph W.) Relatedly,
below the localization-delocalization transition, there is a fairly strong tendency in the 
CONGRESS data for localization to occur on very early Congresses or very late Congresses, 
i.e., at early or late but not at intermediate times. A similar but somewhat weaker tendency 
is seen for localization to occur in the Migration data at the boundaries or geographic 
borders of the data, suggesting that an explanation for this has to do with random walks
"getting stuck" at "corners" of the configuration space. In addition, in the CONGRESS data,
on low-order eigenvectors for which the localization is somewhat less pronounced, there is
often, but not always, substantial mass on several temporally-adjacent Congresses.

• Third, recall that higher-variance eigenvectors occur earlier in the spectrum. As a conse- 
quence of this, the conventional wisdom is that the top eigenvector is relatively smooth and 
that subsequent eigenvectors exhibit characteristic higher-frequency sinusoidal oscillations;
and, indeed, this is observed in both the real and synthetic data. More interestingly, one 
Figure 11: Results from the TwoLevel model, with two different types of base graphs organized
as a path graph; the order of the base graphs is EE2E2. Each "2" has edge densities $p_1 = 0.8$
and $p_2 = 0.2$; each "E" is a random graph with $p$ chosen to match the edge densities inside the beads;
and nodes between successive beads are connected with probability 0.05. Shown are: a pictorial
illustration of the graph; the IPR scores; the normalized square spectrum; and the 5th through
9th eigenvectors.



should observe that in the CONGRESS data, eigenvectors with localization on recent, i.e.,
temporally-later, Congresses tend to occur earlier in the spectrum, i.e., account for a larger
fraction of the variance, than eigenvectors with localization on much older Congresses. An
explanation for this is given by the observation that more recent Congresses are substantially
more "polarized" than earlier Congresses [26, 38, 22]. Since the variance associated with a
more polarized base graph should be larger than that associated with a less polarized base
graph, one would expect that (assuming that eigenvectors with localization on both earlier
and on later Congresses are observed in the data) eigenvectors with localization on recent
(and thus more polarized) Congresses should be seen before eigenvectors with localization
on older (and less polarized) Congresses. This explanation is clearly supported by the
order in which localized low-order eigenvectors appear when more-structured and
less-structured base graphs are combined; see Figures 11 and 12.



• Fourth, recall that lower-order eigenvectors are exactly orthogonal to earlier eigenvectors. 
Since the requirement of exact orthogonality is typically unrelated to the processes gen- 
erating the data, this often manifests itself in denser eigenvectors that often have weaker 
localization properties and that are largely uninterpretable in terms of the domain from 
which the data are drawn. This is the conventional wisdom, and (although not presented 
pictorially) this is also seen in some of the lower-order eigenvectors in the data sets we have 
been discussing. 
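The first rule of thumb, separability of the eigenstates of $H = I \otimes W$, is easy to check numerically. The following sketch assumes NumPy and uses a small random base graph chosen purely for illustration. It verifies that the spectrum of $H$ consists of the eigenvalues of $W$, each repeated once per base graph, and that $e_j \otimes u_1$ is an eigenvector of $H$ for every standard basis vector $e_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4                                 # bead size, number of beads
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
W = (upper | upper.T).astype(float)         # small random symmetric base graph
H = np.kron(np.eye(k), W)                   # H = I (x) W

w_vals, w_vecs = np.linalg.eigh(W)
h_vals = np.linalg.eigvalsh(H)

# Every eigenvalue of I equals 1, so each eigenvalue of W appears k times in H.
assert np.allclose(np.sort(h_vals), np.sort(np.repeat(w_vals, k)))

# e_j (x) u_1 is an eigenvector of H for every bead j.
u1 = w_vecs[:, -1]                          # top eigenvector of W
for j in range(k):
    x = np.kron(np.eye(k)[j], u1)           # supported on bead j only
    assert np.allclose(H @ x, w_vals[-1] * x)
```

Because the eigenvalues of $I$ are all equal, the eigenvectors of the unperturbed $H$ can be chosen to be supported on a single bead. The interaction model $N$ is what breaks this degeneracy.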

Although these rule-of-thumb principles do not explain everything that a rigorous perturbation
analysis of the tensor product of structured matrices might hope to provide, they do help to
explain many empirical results that might otherwise seem arbitrary or be dismissed as artifacts
of noise in the data. In addition, they can be used to understand the properties of eigenvector-
based methods more generally. As a simple example, recall that the CONGRESS data from Section 3
cover a time period when the number of U.S. states, and thus of U.S. senators, did not change






Figure 12: Results from the TwoLevel model, with two different types of base graphs organized
as a path graph; the order of the base graphs is 22E2E. Each "2" has edge densities $p_1 = 0.8$
and $p_2 = 0.2$; each "E" is a random graph with $p$ chosen to match the edge densities inside the beads;
and nodes between successive beads are connected with probability 0.05. Shown are: a pictorial
illustration of the graph; the IPR scores; the normalized square spectrum; and the 5th through
9th eigenvectors.

substantially and thus when the size of the Congress was roughly constant, suggesting that fixed- 
sized beads evolving along a one-dimensional scaffolding might be appropriate. If, instead, one 
was interested in using eigenvector-based methods to examine Congressional voting data from 
1789 to the present [26, 38, 22], then one must take into account that the number of senators
changed substantially over time. In this case, an "ice cream cone" model, where the beads along 
the one-dimensional scaffolding grow in size with time, would be more appropriate. 
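An "ice cream cone" variant can be sketched by letting bead sizes grow along the scaffolding. This is a hypothetical illustration, assuming NumPy. The function name `ice_cream_cone` and its parameters are our own. Beads are Erdos-Renyi graphs, and consecutive beads are linked by random edges. In the CONGRESS data, by contrast, inter-Congress edges join the same individual in successive Congresses.

```python
import numpy as np

def ice_cream_cone(sizes, p_in, p_link, rng):
    """Hypothetical helper: a path of Erdos-Renyi beads whose sizes may grow
    along the one-dimensional scaffolding. Returns (adjacency, bead index)."""
    bead = np.repeat(np.arange(len(sizes)), sizes)   # bead index of each node
    total = int(sum(sizes))
    same = bead[:, None] == bead[None, :]
    adjacent = np.abs(bead[:, None] - bead[None, :]) == 1
    # Edge probability: p_in within a bead, p_link between consecutive beads,
    # and zero otherwise.
    prob = np.where(same, p_in, np.where(adjacent, p_link, 0.0))
    upper = np.triu(rng.random((total, total)) < prob, k=1)
    return (upper | upper.T).astype(float), bead

rng = np.random.default_rng(2)
sizes = [4, 6, 8, 10]                 # beads grow with "time"
G, bead = ice_cream_cone(sizes, 0.5, 0.1, rng)
```

The diagnostics used for the TwoLevel figures can then be applied to `G` unchanged, since nothing in them assumes equal bead sizes.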

5 Discussion and conclusion 

We have investigated the phenomenon of low-order eigenvector localization in Laplacian matrices 
associated with data graphs. Our contributions are threefold: first, we have introduced the 
notion of low-order eigenvector localization; second, we have described several examples of this 
phenomenon in two real data sets, illustrating that the localization can in some cases highlight 
meaningful structural heterogeneities in the data that are of potential interest to a downstream 
analyst; and third, we have presented a very simple model that qualitatively reproduces several 
of the empirical observations. Our model is a very simple two-level tensor product construction, 
in which each level can be "structured" or "unstructured." Although simple, this model suggests 
certain structural similarities among the seemingly-unrelated applications where we have observed 
low-order eigenvector localization, and it may be used as a diagnostic tool to help extract insight 
from data graphs when such low-order eigenvector localization is present. At this point, our model 
is mostly "descriptive," in that it can be used to describe or rationalize empirical observations. 
We will conclude this paper with a discussion of our results in a more general context. 

Recall that the idea behind nonlinear dimensionality reduction methods such as Laplacian 
eigenmaps [4] and the related diffusion maps [9] is to use eigenvectors of a Laplacian matrix cor- 
responding to the coarsest modes of variation in the data matrix to construct a low-dimensional 
representation of the data. The embedding provided by these top eigenvectors is often inter-
preted in terms of an underlying low-dimensional manifold that is "nice," e.g., that does not have
pathological curvature properties or other pathological distributional properties that would
produce structural heterogeneities and hence eigenvector localization; and this embedding is
used to perform tasks such as classification, clustering, and regression. Our results illustrate that 
meaningful low-variance information will often be lost with such an approach. Of course, there 
is no reason that general data graphs should look like limiting discretizations of nice manifolds, 
but it has been our experience that the empirical results we have reported are very surprising to 
practitioners of eigenvector-based machine learning and data analysis methods. 

Far from being exotic or rare, however, a two-level structure such as that posited by our 
TwoLevel model is quite common — e.g., time series data have a natural one-dimensional tem- 
poral ordering, DNA single-nucleotide polymorphism data are ordered along a one-dimensional 
chromosome along which there is correlational or linkage disequilibrium structure, and hyper- 
spectral data in the natural sciences have a natural ordering associated with the frequency. Not 
surprisingly, then, we have observed similar qualitative properties to those we have reported here 
on several of these other types of data sets, and we expect observations similar to those we have 
made to be made in many other applications. 

In some cases, low-order eigenvector localization has similarities with localization on extremal 
eigenvectors. In general, though, drawing this connection is rather tricky, especially if one is inter- 
ested in extracting insight or performing machine learning when low-order eigenvector localization 
is present. Thus, a number of rather pressing questions are raised by our observations. An obvious 
direction has to do with characterizing more broadly the manner in which such localization occurs 
in practice. It is of particular interest to understand how it is affected by smoothing and prepro- 
cessing decisions that are made early in the data analysis pipeline. A second obvious direction 
has to do with providing a firmer theoretical understanding of low-order localization. This will 
require a matrix perturbation analysis of the tensor product of structured matrices, which to the 
best of our knowledge has not been considered yet in the literature. This is a technically-involved 
topic, in part since a straightforward application of matrix perturbation ideas tends to "wash 
out" the bottom part of the spectrum. A third direction has to do with understanding the rela- 
tionship between the low-order localization phenomenon we have reported and recently-developed 
local spectral methods that implicitly construct local versions of eigenvectors [32l O [18] . A final 
direction that is clearly of interest has to do with understanding the implications of our empirical 
observations on the applicability of popular eigenvector-based machine learning and data analysis 
tools. 

Acknowledgments: We would like to acknowledge SAMSI and thank the members of its 
2010-2011 Geometrical Methods and Spectral Analysis Working Group for helpful discussions. 